Coalescence-fragmentation problems are now of great interest across the physical, biological,
and social sciences. They are typically studied from the perspective of rate equations,
at the heart of which are the rules used for coalescence and fragmentation. Here we
discuss how changes in these microscopic rules affect the macroscopic cluster-size distri- 
bution which emerges from the solution to the rate equation. Our analysis elucidates the 
crucial role that the fragmentation rule can play in such dynamical grouping models. We 
focus our discussion on two well-known models whose fragmentation rules lie at opposite 
extremes. In particular, we provide a range of generalizations and new analytic results for 
the well-known model of social group formation developed by Eguíluz and Zimmermann
[V. M. Eguíluz and M. G. Zimmermann, Phys. Rev. Lett. 85, 5659 (2000)]. We develop
analytic perturbation treatments of this original model, and extend the analytic analysis to 
the treatment of growing and declining populations. 



I. INTRODUCTION 



The challenge to understand the dynamics of Complex Systems is attracting increasing attention,
particularly in the socio-economic and biological domains [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].
For example, the recent turmoil in the financial markets
has created significant public speculation as to the root cause of the observed fluctuations.
At their heart, all Complex Systems share the common property of featuring many interacting
objects from which the observed macroscopic features emerge. Exactly how this happens cannot
yet be specified in a generic way - however, an important milestone in this endeavor is to
develop a quantitative understanding of any internal clustering dynamics within the population.
Coalescence-fragmentation processes have been studied widely in conventional chemistry and
physics - however, collective behavior in social systems is not limited by nearest neighbor interactions, nor
are the details of social coalescence or fragmentation processes necessarily the same as in physical 
and biological systems. The challenge for a theorist is then twofold: (1) to provide a model which 
accounts correctly for the observed real-world behavior — e.g., in the case that power-laws are 
observed empirically, the model should be able to reproduce the power-law dependence itself, the 
value of the corresponding power-law exponent, and possibly also the form of the truncation; (2) 
the rules invoked in the model need to make sense in the context of the real-world system being 
discussed. 

In this paper, we discuss coalescence and fragmentation problems with a focus on social sys- 
tems. In particular, we consider interactions which are essentially independent of spatial separation 
in order to mimic the effect of modern communications etc. Much of our discussion is focused 
around fragmentation processes in which an entire cluster breaks up into its individual pieces - 
thereby mimicking a social group disbanding - as opposed to the more typical case studied in 
physical and biological systems of binary splitting. We limit our discussion to the steady-state 
behavior corresponding to a constant population, or a steadily growing/declining population. In 
Sec. I B we lay out a general formulation of such coalescence-fragmentation problems. In order
to understand the quantitative effects of a particular choice of fragmentation rule, Sec. II then
compares two well-known coalescence-fragmentation models, with fragmentation rules which lie
at opposite extremes of the spectrum. One of these is the well-known physics-inspired model of
social group formation introduced by Eguíluz and Zimmermann [2], while the other is a standard
model in mathematical ecology due to Gueron and Levin [7]. The explicit comparison between
the two models allows us to elucidate the subtle differences in their microscopic rules that make
their macroscopic distributions differ, and leads us to a better generic understanding of the crucial
role that the fragmentation rule can play. We then proceed to focus on the physics-based model
of Eguíluz and Zimmermann, generalizing it in several ways and providing new analytic results
(Sec. III). We analyze a perturbed version of the Eguíluz-Zimmermann model where spontaneous
cluster formation is present (Sec. III A), as well as generalized versions in which there is a steadily
growing (Sec. III C 1) or declining population (Sec. III C 2). Further realistic modifications of the
Eguíluz-Zimmermann model are discussed in Sec. III D.

There is of course a huge volume of work in the mathematics, physics and chemistry literature 
on the topic of clustering within a many-body population of interacting particles [43]. The Smoluchowski
coalescence equation is arguably the most famous and well-studied example [21, 22, 23].
Reference [43] provides an excellent recent review of coalescence-fragmentation models in physical
and chemical systems from a mathematician's perspective - however, we note that the socially-inspired
models that we focus upon in this article are not discussed. Many previous studies have
tended to focus on generic mathematical issues such as existence, uniqueness, mass conservation,
gelation and finite size effects (see Refs. [24, 28] and references therein). When it comes
to Complex Systems - and in particular, social systems - the more pressing goal is to understand
the emergent features of the population. In contrast to physical and chemical systems in which 
collision energetics play a crucial role in guiding the specification of microscopic coalescence and 
fragmentation rules, the precise microscopic rules in social systems are unknown - however, the 
overall macroscopic emergent phenomena such as cluster size distribution can be measured rela- 
tively easily. In financial markets, the collective dynamics of the population of traders is registered 
directly by means of the price. Indeed, as many prior works have shown, such collective behavior 
in social systems tends to produce near scale-free (i.e. power-law) networks and/or cluster sizes 
in a variety of real-world situations. For example, the distribution of transaction sizes follows a
power-law with slope near 2.5 for each of the three major stock exchanges in New York, Paris
and London [44]. In addition, it has been shown that the distributions of the severity of violent
events inflicted in conflict by insurgent groups, and by terrorist groups, follow a power-law with
slope near 2.5 [45]. The model of Eguíluz and Zimmermann [2], which is the starting point of much
of the paper's discussion, is therefore an attractive candidate model for such social systems. In
addition to its intrinsic theoretical interest because of its non-binary fragmentation rule, which
mimics the disbanding of social groups, it also happens to produce a robust power-law distribution
of cluster sizes with slope 2.5 [2].



A. Modeling Social Systems 



Many social systems seem to comprise a large number of dynamically evolving clusters. 
Over time, and in an apparently self-organized way, clusters either coalesce with each other 
to form even larger clusters, or fragment to form a collection of smaller ones. In addition 
to everyday social situations, these characteristics seem consistent with common sense notions
of the dynamical connectivity within a community of financial traders [2], or even a
loosely connected insurgent population or terrorist/criminal network [4, 6]. Figure 1 illustrates
the generic situation of interest in many recent works on coalescence-fragmentation models.

[Figure 1 labels: cluster $i$ (size $s_i$) may fragment; clusters may coalesce.]

FIG. 1: Schematic diagram indicating the presence of coalescence and fragmentation processes,
for a population of $N = 15$ objects dynamically partitioned into clusters. The size of cluster $i$ is
$s_i = 2$, while the size of cluster $j$ is $s_j = 6$ etc. The fragmentation process exhibits the richest
range of possibilities, given the combinatorial number of ways in which a cluster can in principle
be divided. There are many possible realizations of the objects themselves, e.g. humans, animals,
macromolecules, though for simplicity we show them as humans.

As a result of coalescence and fragmentation processes over time, the population of $N$ objects
undergoes dynamical partitioning into
clusters $i, j, k, \ldots$ of size $s_i, s_j, s_k, \ldots$, where both the number of clusters and their membership
are typically time-dependent. We have denoted the $N$ objects in human form, but of course they
could be animals, macromolecules or other indivisible entities. Earlier studies tended to focus 
on situations in which the interactions between clusters might be expected to decay with physical 
separation - as in a simple solution of molecules interacting through Van der Waals interactions 
for example. However in modern-day social applications, where long-distance communication is 
as commonplace as communication with neighbors, it makes more sense to have interactions over 
all lengthscales, with the interaction probability effectively independent of physical separation. 
These are the type of interactions that we discuss here. 

Of the two processes in Fig. 1, i.e. coalescence and fragmentation, the coalescence process is 
likely to be the simpler and more generic. Suppose we have a particular partition of a population 
of $N$ objects into clusters as in Fig. 1, and that a cluster $i$ of size $s_i = 2$ is to coalesce. It is unlikely
to undergo three-body collisions and/or interactions, and hence its most likely coalescence event
is to join with a single other cluster $j$. Given that the size of a cluster measures the number of
objects in it, it is therefore reasonable to imagine that the coalescence probability should increase 
as the size of the clusters themselves increase. In a more human setting, the more objects that a 
cluster contains, the more likely it is that something will happen to one of its members in order 
to induce such an event, and hence the probability will increase with the size of the cluster. We 
therefore adopt size-dependent coalescence probabilities in this work. We note that although we 
are using the term 'cluster' throughout this paper for convenience, it can also be taken to mean a 
'community' in the language of network science [54] since it denotes a subset of the population
who have very strong links between them, while the links between clusters are negligibly weak. 
We note also that the term 'cluster' need not necessarily mean physical connection - instead it 
could represent a group of objects whose actions happen to be coordinated in some way. Hence 
the coalescing of two clusters, however distant in real space, can mean an instantaneous alignment 
of their coordinated activities, as one might expect in a financial market [3], organized crime or
insurgent warfare [4, 6]. In such a situation, a common fragmentation event would then likely be
a sudden disruption of this coordination - hence it is this type of fragmentation rule that forms 
the focus of our work. Although we do not explore the details of real-world applications such as 
financial markets or insurgent warfare here, it is useful to keep them in mind when we discuss the 
consequences of the different fragmentation rules later in the paper. 

As mentioned earlier, the distinct feature of many real-world systems is the existence of scale-free
behavior in the time-averaged cluster size distribution (see, e.g., Refs. [4, 47]), such that in the
first instance these systems can be characterized by the exponent of their power law and by the 
range of its scale-free behavior. One may therefore ask: Which ingredients of the coalescence- 
fragmentation models, or combinations of ingredients, turn out to control the various observable 
aspects? It is this general question that motivates the present work. 

B. General Formulation 

Once the probabilities specifying the coalescence and fragmentation are given, the cluster size 
distribution may be computed either by a direct simulation of the model or in a mean-field theory 
approximation by solving an appropriate set of rate equations, often numerically. The rate equations
are typically non-linear. The non-trivial question of existence and uniqueness of the time-independent
solution therefore arises, and is addressed in seminal works such as Ref. [42].
For the social/economical models of current interest, the uniqueness and existence can be shown
at the level of the rate equations, and verified by direct simulations. We consider mostly 'steady- 
state' models, in which there is some form of robust long-time behavior. 

The number of clusters of size $s$ at time $t$ is $n_s(t)$, and $N$ is the total number of members (i.e.
the population size). We will drop the explicit time-dependence for simplicity, since it will be
clear from the context whether we are discussing $n_s(t)$ or its steady-state time-averaged value. In
order to characterize a general system, we need to prescribe the following two functions, each of
dimension [time]$^{-1}$:

• The coalescence function $C(s, s')$, which is the rate describing the process by which two
clusters of sizes $s$ and $s'$ merge. We only consider coalescence which depends on the details
of a pair of clusters, and hence exclude the possibility that 3 (or more) clusters are involved
in the merging process.

• The fragmentation function $\mathcal{F}_{\mathcal{R}}(s; m_1, m_2, \ldots, m_{n-1})$, which is the rate describing the process
by which a cluster of size $s$ fragments into a configuration which contains $m_1$ clusters
of size 1, $m_2$ clusters of size 2, etc.

The functional form of the above two functions is taken to be time-independent. If we consider
general fragmentation processes, we see that a large number of parameters are necessary to characterize
the fragmentation. However, in order to write down the rate equations and hence calculate
the cluster size distribution, we do not need complete knowledge of the fragmentation function
(i.e. we do not need knowledge about all possible partitions). It is sufficient to know the reduced
fragmentation function $\mathcal{F}(s, s', m)$, defined as the rate at which a cluster of size $s$ fragments into
a configuration which contains $m$ clusters of size $s'$ plus any other clusters with sizes different from
$s'$. In addition to $\mathcal{F}(s, s', m)$ we need to know the rate at which the fragmentation of any given cluster
of size $s$ occurs, which we denote as $f(s)$. In principle we can calculate it by summing the complete
fragmentation function over all partitions of the fragmentation products. By prescribing the
reduced fragmentation function $\mathcal{F}(s, s', m)$ we do not characterize uniquely the fragmentation of
the system and in general we may not be able to calculate $f(s)$ - yet it is possible in specific cases
to do so once the assumption regarding the fragmentation products has been stated. Looking at the
average number of clusters of size $s$ that in unit time undergo the various processes (see Fig. 2),
we may introduce the following notation:

• $L_F(s)$: loss due to fragmentation, the number of clusters of size $s$ that fragment

• $L_C(s)$: loss due to coalescence, the number of clusters of size $s$ that join with other clusters

• $G_C(s)$: gain from coalescence, the number of clusters of size $s$ created from the merging of
clusters of size smaller than $s$

• $G_F(s)$: gain from fragmentation, the number of clusters of size $s$ created from fragmenting
clusters of size larger than $s$

FIG. 2: The various processes of cluster coalescence and fragmentation which give rise to $L_F$,
$L_C$, $G_F$, $G_C$ for any particular value of $s$. The top figure represents the appearance of new
clusters of size $s$, the bottom one represents their loss. In the interests of simplicity, the
fragmentation into two clusters has been depicted and only a few processes are shown.



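Symbolically the rate equations for any $s$ are written as

$$\frac{\partial n_s}{\partial t} = -L_F(s) - L_C(s) + G_C(s) + G_F(s), \tag{1}$$

which explicitly reads as

$$\frac{\partial n_s}{\partial t} = -f(s)\, n_s - n_s \sum_{s'=1}^{N} n_{s'}\, C(s, s') + \frac{1}{2} \sum_{s'=1}^{s-1} n_{s'}\, n_{s-s'}\, C(s', s-s') + \sum_{s'=s+1}^{N} n_{s'} \sum_{m=1}^{[N/s]} m\, \mathcal{F}(s', s, m). \tag{2}$$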



The last term represents the gain in the number of clusters of size $s$ coming from fragmentation
of other clusters of size $s' > s$, in such a way that among the fragmentation products we have $m$
clusters of size $s$. We are summing over all possible values of $m$ and $s'$. Note that we sum over
$s'$, which is here the first (not the second) argument of $\mathcal{F}$. An explicit form for $f(s)$ is discussed
above. It is convenient to formally define

$$F(s, s') = \sum_{m=1}^{[N/s']} m\, \mathcal{F}(s, s', m). \tag{3}$$

We therefore write the last term of Eq. (2) as

$$\sum_{s'=s+1}^{N} n_{s'}\, F(s', s).$$
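As a rough illustration of how Eq. (2) can be evaluated once $C(s, s')$, $f(s)$ and the summed function $F(s', s)$ of Eq. (3) are supplied, the following minimal Python sketch computes the right-hand side for every $s$. The function name `rate_rhs`, the array layout and the parameter values are assumptions made for illustration only; the example rules are of the E-Z type with $a(s) = b(s) = s$.

```python
import numpy as np

def rate_rhs(n, C, f, F):
    """Right-hand side of Eq. (2) for s = 1..N (illustrative sketch).

    n : array of length N+1; n[s] is the number of clusters of size s (n[0] unused).
    C : callable C(s, s') giving the coalescence rate.
    f : callable f(s) giving the total fragmentation rate of a cluster of size s.
    F : callable F(s', s) giving the summed fragmentation function of Eq. (3).
    """
    N = len(n) - 1
    rhs = np.zeros(N + 1)
    for s in range(1, N + 1):
        loss_frag = f(s) * n[s]
        loss_coal = n[s] * sum(n[sp] * C(s, sp) for sp in range(1, N + 1))
        gain_coal = 0.5 * sum(n[sp] * n[s - sp] * C(sp, s - sp) for sp in range(1, s))
        gain_frag = sum(n[sp] * F(sp, s) for sp in range(s + 1, N + 1))
        rhs[s] = -loss_frag - loss_coal + gain_coal + gain_frag
    return rhs

# Hypothetical parameter values; rules of the E-Z type with a(s) = b(s) = s.
alpha, beta = 0.01, 1.0
C_ez = lambda s, sp: alpha * s * sp
f_ez = lambda s: beta * s if s > 1 else 0.0
F_ez = lambda sp, s: beta * sp * sp if (s == 1 and sp > 1) else 0.0

# Test distribution occupying only sizes <= 10, so every merger fits inside N = 20.
n = np.zeros(21)
n[1:11] = [5, 4, 3, 2, 1, 1, 1, 1, 1, 1]
rhs = rate_rhs(n, C_ez, f_ez, F_ez)
mass_drift = float(np.dot(np.arange(21), rhs))  # sum over s of s * dn_s/dt
```

For this test distribution the quantity `mass_drift` should vanish up to rounding, since neither process creates or destroys objects.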

II. ROLE OF THE FRAGMENTATION FUNCTION

A logical first step in the quest to understand classes of models which differ in their cluster fragmentation
process is to look at extreme cases. One such case is the Eguíluz-Zimmermann (E-Z)
model [2]. In the E-Z model, fragmentation of a cluster of size $s$ always produces $s$ clusters of size
1, i.e. the cluster breaks up into individual objects. At the other extreme is the famous Gueron and
Levin (G-L) model [7], in which fragmentation of a cluster yields two smaller pieces, i.e. the original
cluster splits into two clusters. The original G-L model is formulated in terms of continuous
distributions - however, since our aim is to analyze the effects of these rules on the same footing,
we will focus on the discrete version of the G-L model, returning to the continuous formulation
later on. The G-L model is in fact identical to Smoluchowski's coagulation-fragmentation model
with binary fragmentation.

The common feature of the models that we discuss is the presence of a separable coalescence
function:

$$C(s, s') = \alpha\, a(s)\, a(s'). \tag{4}$$

In principle, the multiplicative constant may be absorbed into $a(s)$; however, we prefer to keep it
explicitly and adopt a dimensionless $a(s)$. This class of model is further specified by introducing
a coalescence mechanism on the microscopic scale, namely that two clusters merge when any
member from one cluster connects to any member from the other cluster. In a macroscopic description,
this is equivalent to assuming that $a(s) = s$. We note that Gueron and Levin [7], having
the solution of the rate equations for $a(s) = s$, considered explicitly the other cases $a(s) = 1$ and
$a(s) = 1/s$ by means of the substitution $n_s \to a(s)\, n_s$ - however, this substitution affects the form
of the fragmentation function $\mathcal{F}(s, s', m)$.

A. Fragmentation function

Assuming that the cluster may only split into two pieces still does not uniquely specify the
fragmentation, since we still need information about the probability distribution for the sizes of
the fragments. In the G-L model, it is stated that the conditional distribution for fragments is
uniform [7], i.e. the fragmentation of a cluster occurs with a probability which is independent of
the way in which the cluster breaks. The reduced fragmentation function for $s > 1$ is therefore

$$\mathcal{F}_{GL}(s, s', m) = \beta\, b(s) \left[ 2\, \delta_{m,1} (1 - \delta_{2s', s}) + \delta_{m,2}\, \delta_{2s', s} \right], \tag{5}$$

where we have accounted for the fact that if $2s' = s$, the cluster breaks into two fragments of equal
size. Using Eq. (3) one obtains immediately

$$F_{GL}(s, s') = 2 \beta\, b(s). \tag{6}$$

The fragmentation probability is calculated as follows:

$$f_{GL}(s) = \frac{1}{2} \sum_{s'=1}^{s-1} \mathcal{F}_{GL}(s, s', m=1) + \sum_{s'=1}^{s-1} \mathcal{F}_{GL}(s, s', m=2) = \beta\, (s-1)\, b(s), \tag{7}$$

where the factor 1/2 in the first term appears in order to avoid double-counting, and the second
term represents splitting into two equal parts. In the E-Z fragmentation scheme, the cluster of size
$s$ can only break up into individual objects and there is only one mode of fragmentation, hence

$$\mathcal{F}_{EZ}(s, s', m) = \beta\, b(s) (1 - \delta_{s,1})\, \delta_{s',1}\, \delta_{m,s}. \tag{8}$$

Using Eq. (3) we have

$$F_{EZ}(s, s') = \beta\, s\, b(s) (1 - \delta_{s,1})\, \delta_{s',1}. \tag{9}$$

The fragmentation probability is

$$f_{EZ}(s) = \sum_{s'=1}^{s-1} \mathcal{F}_{EZ}(s, s', m=s) = \beta\, (1 - \delta_{s,1})\, b(s). \tag{10}$$

There is no double-counting problem here. A peculiar feature of the E-Z model is that the corresponding
set of rate equations is semi-recursive, i.e. the $k$-th equation depends only on the values of
$n_{s'}$ for $s' < k$ and on a global constant depending on all $n_s$. This feature makes it easy to
show the existence and uniqueness of the solution and also to solve the system numerically.
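The bookkeeping behind Eqs. (5)-(10) can be checked by brute force. The sketch below encodes the two reduced fragmentation functions and performs the sums of Eqs. (3), (7) and (10) directly. It assumes $b(s) = s$ and $\beta = 1$ as illustrative defaults, and all function names are hypothetical. The sum over $m$ is truncated at $s$, which loses nothing because a cluster of size $s$ cannot yield more than $s$ fragments.

```python
def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0

def F_red_GL(s, sp, m, beta=1.0, b=lambda s: s):
    """Reduced G-L fragmentation function, Eq. (5), for s > 1."""
    return beta * b(s) * (2 * delta(m, 1) * (1 - delta(2 * sp, s))
                          + delta(m, 2) * delta(2 * sp, s))

def F_red_EZ(s, sp, m, beta=1.0, b=lambda s: s):
    """Reduced E-Z fragmentation function, Eq. (8)."""
    return beta * b(s) * (1 - delta(s, 1)) * delta(sp, 1) * delta(m, s)

def summed(F_red, s, sp):
    """Summed fragmentation function, Eq. (3), with the m-sum truncated at s."""
    return sum(m * F_red(s, sp, m) for m in range(1, s + 1))

def f_GL(s):
    """Total G-L fragmentation rate, Eq. (7); the 1/2 removes double counting."""
    return (0.5 * sum(F_red_GL(s, sp, 1) for sp in range(1, s))
            + sum(F_red_GL(s, sp, 2) for sp in range(1, s)))

def f_EZ(s):
    """Total E-Z fragmentation rate, Eq. (10)."""
    return sum(F_red_EZ(s, sp, s) for sp in range(1, s))
```

With these defaults, `f_GL(s)` evaluates to $s(s-1)$ and `f_EZ(s)` to $s$ for $s > 1$, so the two rules coincide only at $s = 2$.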

It is a common feature of both G-L and E-Z type models to assume that $a(s) = b(s)$. Mathematically,
this acts to restrict the space of all possible solutions; otherwise the diversity of general
solutions would be overwhelming. In physical terms, the justification for this assumption is that
there is interest in the specific case a(s) = b(s) = s, since this describes the case of fragmentation 
of the cluster being triggered by a single member - hence the proportionality to s. Similarly the 
likelihood that two groups would become coordinated and hence act as a single unit (i.e. they 
coalesce) would be proportional to the size of each of the groups, if the underlying mechanism 
involved one member from each initiating the process by forming a link followed by all the other 
members. 
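One way to realize rates proportional to $s$ (fragmentation) and to $s s'$ (coalescence) in a direct simulation is to let every event be triggered by randomly chosen members. The sketch below is an assumption-laden illustration rather than the procedure of any particular reference: the update rule, the function name `simulate_ez` and the parameter `p_frag` (the probability that a triggered event is a fragmentation) are all hypothetical choices.

```python
import random

def simulate_ez(N=200, p_frag=0.05, steps=20000, seed=0):
    """Toy simulation of member-triggered total fragmentation and pairwise coalescence."""
    rng = random.Random(seed)
    sizes = [1] * N  # start from N isolated objects
    for _ in range(steps):
        # Selecting a cluster with probability proportional to its size is
        # equivalent to picking one of the N members uniformly at random.
        i = rng.choices(range(len(sizes)), weights=sizes)[0]
        if rng.random() < p_frag:
            # Fragmentation: the whole cluster disbands into single objects.
            s = sizes.pop(i)
            sizes.extend([1] * s)
        else:
            # Coalescence: a second member is picked; if it belongs to a
            # different cluster, the two clusters merge.
            j = rng.choices(range(len(sizes)), weights=sizes)[0]
            if j != i:
                merged = sizes[i] + sizes[j]
                for idx in sorted((i, j), reverse=True):
                    sizes.pop(idx)
                sizes.append(merged)
    return sizes

sizes = simulate_ez()
```

The returned list holds the cluster sizes at the final step; a time-averaged histogram of such lists over many steps is what would be compared with the mean-field cluster size distribution.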

With the assumptions made so far, it turns out that each system is described by three constants:
$\alpha$, $\beta$ and the total population size $N$. For the time-independent system we need just two constants,
and since $\alpha$ and $\beta$ are of dimension [time]$^{-1}$, only their ratio $\alpha/\beta$ should appear. The steady-state
rate equations are as follows.

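G-L system:

$$-\beta (s^2 - s)\, n_s - \alpha\, s\, n_s \sum_{s'=1}^{N} s' n_{s'} + \frac{\alpha}{2} \sum_{s'=1}^{s-1} s' n_{s'}\, (s - s')\, n_{s-s'} + 2\beta \sum_{s'=s+1}^{N} s' n_{s'} = 0. \tag{11}$$

E-Z system:

$$-\beta s\, (1 - \delta_{s,1})\, n_s - \alpha\, s\, n_s \sum_{s'=1}^{N} s' n_{s'} + \frac{\alpha}{2} \sum_{s'=1}^{s-1} s' n_{s'}\, (s - s')\, n_{s-s'} + \beta\, \delta_{s,1} \sum_{s'=s+1}^{N} s'^2 n_{s'} = 0. \tag{12}$$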



Eguíluz and Zimmermann [2] explicitly used the following constants:

$$\alpha = \frac{2(1 - \nu)}{N^2}, \qquad \beta = \frac{\nu}{N}. \tag{13}$$

We see that both sets of equations (11) and (12) simplify if we express them in terms of $k_s = s\, n_s$,
i.e. the number of agents contained in clusters of size $s$. Note that for general $a(s)$, we need to
substitute $k_s = a(s)\, n_s$.
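The semi-recursive structure of the E-Z equations suggests a direct numerical solution. The following is a minimal sketch under two stated assumptions: the summation limits are extended to infinity so that $\sum_{s'} s' n_{s'} = N$, and only the ratio $\alpha/\beta = 2(1-\nu)/(N\nu)$ enters. With these, Eq. (12) for $s \geq 2$ reduces to $k_s = c \sum_{r=1}^{s-1} k_r k_{s-r}$ with $c = (1-\nu)/[N(2-\nu)]$. The starting value $k_1 = N/(2-\nu)$ is not quoted in the text; it is the value that makes the normalization hold under the same assumption. The function name and parameter values are illustrative.

```python
def ez_steady_state(N, nu, s_max):
    """Mean-field E-Z steady state n_s for s = 1..s_max (index 0 unused).

    Assumes sums extended to infinity, so that sum_s s*n_s = N.
    """
    c = (1.0 - nu) / (N * (2.0 - nu))
    k = [0.0] * (s_max + 1)        # k[s] = s * n_s
    k[1] = N / (2.0 - nu)          # assumption: fixed by normalization
    for s in range(2, s_max + 1):
        # Each k[s] depends only on k[1..s-1]: the semi-recursive property.
        k[s] = c * sum(k[r] * k[s - r] for r in range(1, s))
    return [0.0] + [k[s] / s for s in range(1, s_max + 1)]

n = ez_steady_state(N=1000, nu=0.3, s_max=2000)
total = sum(s * n[s] for s in range(1, len(n)))
```

Here `total` should reproduce $N$ to high accuracy, because for $\nu = 0.3$ the distribution has decayed to a negligible level well before $s = 2000$.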



B. Equilibrium in Gueron-Levin model: Continuous formulation 

Gueron and Levin's solution [7] to the G-L model was obtained for the system with continuous
cluster density, which we denote as $n(s)$. In terms of $k(s) = s\, n(s)$, the integral rate equation
corresponding to Eq. (11) with no limit on the maximum size of a cluster is given by:

$$0 = -\beta s\, k(s) - \alpha\, k(s) \int_0^{\infty} ds'\, k(s') + \frac{\alpha}{2} \int_0^{s} ds'\, k(s')\, k(s - s') + 2\beta \int_s^{\infty} ds'\, k(s'). \tag{14}$$

Looking at this equation, we guess that the solution is obtained by substituting an ansatz which
satisfies $k(s + s') \propto k(s)\, k(s')$. The first form to try is $k(s) = A e^{-\mu s}$. With this ansatz we obtain

$$0 = -A \beta\, s\, e^{-\mu s} - A^2 \frac{\alpha}{\mu} e^{-\mu s} + A^2 \frac{\alpha}{2}\, s\, e^{-\mu s} + 2A \frac{\beta}{\mu} e^{-\mu s}. \tag{15}$$

There are two types of terms, of the form $\sim e^{-\mu s}$ or $\sim s\, e^{-\mu s}$. Eliminating the overall exponential
factor we have

$$0 = s \left( -A\beta + A^2 \frac{\alpha}{2} \right) + \frac{1}{\mu} \left( 2A\beta - A^2 \alpha \right). \tag{16}$$

Both terms in parentheses vanish if we choose

$$A = 2 \frac{\beta}{\alpha}. \tag{17}$$

The scale factor $\mu$ in the exponent is determined to be $\mu = 2\beta/(N\alpha)$ by normalization. The solution
to Eq. (14) is just an exponential function, which was obtained by Gueron and Levin by means of
a Laplace transform.

We notice here a remarkable curiosity: If we take the actual solution of Eq. (14), then for any
$s$ the following equalities hold exactly:

$$L_F(s) = G_C(s), \qquad L_C(s) = G_F(s). \tag{18}$$

This is in effect detailed balance. In other words, the following holds for the G-L model:
The average loss of clusters of size $s$ due to cluster fragmentation is equal to the average gain
obtained from the coalescence of clusters of sizes smaller than $s$. Also, the average loss of clusters
of size $s$ due to coalescence with other clusters is equal to the average gain obtained from the
fragmentation of clusters of sizes larger than $s$.
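The pairwise balance of Eq. (18) can be confirmed numerically by evaluating the four terms of Eq. (14) separately for the exponential solution. The sketch below uses a hand-written trapezoid rule and hypothetical parameter values. The terms are those of Eq. (14), i.e. they are written for $k(s)$ and so differ from the corresponding terms for $n(s)$ by a common factor $s$, which does not affect the equalities.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoid rule on a grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def gl_terms(s, alpha, beta, N, span=60.0, n_grid=200001):
    """Loss and gain terms of Eq. (14) evaluated for k(s) = A exp(-mu s)."""
    A = 2.0 * beta / alpha              # Eq. (17)
    mu = 2.0 * beta / (N * alpha)       # from normalization
    k = lambda x: A * np.exp(-mu * x)
    x_all = np.linspace(0.0, span / mu, n_grid)     # stands in for (0, infinity)
    x_in = np.linspace(0.0, s, n_grid)              # (0, s)
    x_out = np.linspace(s, s + span / mu, n_grid)   # stands in for (s, infinity)
    L_F = beta * s * k(s)
    L_C = alpha * k(s) * trapezoid(k(x_all), x_all)
    G_C = 0.5 * alpha * trapezoid(k(x_in) * k(s - x_in), x_in)
    G_F = 2.0 * beta * trapezoid(k(x_out), x_out)
    return L_F, L_C, G_C, G_F

L_F, L_C, G_C, G_F = gl_terms(s=3.0, alpha=0.02, beta=1.0, N=100.0)
```

Up to quadrature error, `L_F` matches `G_C` and `L_C` matches `G_F`, while `L_F` and `L_C` themselves differ.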

In addition to its mathematical interest, this identity (which is not satisfied for the E-Z model 
as discussed below) shows up a fundamental feature of the G-L model, which arises in turn from 
the microscopic rules which characterize it. This symmetry is also revealed if we look at the be- 
havior of the system with time flowing backwards. (In general, one does not obtain a stochastic 
system by time-reversing the recorded history of a second non-equilibrium stochastic system. Al- 
though this becomes an issue for discrete systems due to the presence of fluctuations, we may still 
discuss it from the perspective of the average quantities describing the equilibrium state). With 
the reversed time perspective, the coalescence of clusters is observed as fragmentation and vice- 
versa, but the average cluster size distribution remains unaltered in the equilibrium state. As far 
as this average quantity is concerned, the system is therefore invariant under an interchange of
coalescence/fragmentation processes - and in the specific case of the G-L model, the time-reversed
processes are exactly the same as the original ones.



C. Cluster size distribution: The exponential cutoff 

We now return to the discrete formulation. For the discrete version of the G-L system, it may
be verified by direct computation that the steady-state value

$$n_s = 2 \frac{\beta}{\alpha}\, s^{-1} \exp(-\mu s) \tag{19}$$

is also a solution of Eq. (11), once we make the approximation of extending the summation limits
to infinity. Here the normalization condition is $N = \sum_{s'=1}^{\infty} s'\, n_{s'}$, from which we calculate

$$\mu = \ln\left( \frac{2\beta}{\alpha N} + 1 \right). \tag{20}$$

Thus we have

$$n_s = 2 \frac{\beta}{\alpha}\, s^{-1} \left( \frac{2\beta}{\alpha N} + 1 \right)^{-s}. \tag{21}$$

It is advantageous to consider $\beta/\alpha \propto N$, so that the exponent is independent of $N$ and $n_s$ is just
proportional to $N$. If we use here the same constants (Eq. (13)) as the original E-Z model, the
solution is

$$\text{G-L}: \quad n_s = N \frac{\nu}{1 - \nu}\, s^{-1} (1 - \nu)^{s}. \tag{22}$$

The solution to the E-Z model rate equations may be approximated as

$$\text{E-Z}: \quad n_s \sim N s^{-2.5} \left( \frac{(2 - \nu)^2}{4(1 - \nu)} \right)^{-s}. \tag{23}$$

In order to compare the cluster size distribution for both models, we will for convenience 
characterize both using the same parameters $N$ and $\nu$. This means that they will have the same
coalescence function, and their fragmentation functions will agree for the splitting of clusters of 
size s = 2. The difference between the two models then lies in the fragmentation of larger clusters. 
This allows them to be compared on a similar footing, focusing just on the effect of their respective 
fragmentation functions. 

The cluster size distribution for both models is of the form $n_s \propto s^{-\kappa} e^{-\mu s}$. The scale of $s$ at
which the exponential cutoff becomes relevant can be identified by looking at the ratio

$$\frac{n_{s+1}}{n_s} = e^{-\mu} \left( \frac{s+1}{s} \right)^{-\kappa} = e^{-\mu} \left( 1 - \frac{\kappa}{s} + O\!\left( \frac{1}{s^2} \right) \right). \tag{24}$$

We assumed that $\mu \ll 1$, which is the regime in which such models exhibit power-law behavior.
The exponential cutoff becomes dominant at the scale where $e^{-\mu} \sim (1 - \kappa/s)$, hence we may define

$$s_{\text{cutoff}} = \frac{\kappa}{1 - e^{-\mu}}. \tag{25}$$

For the models of interest in this paper with $\mu \ll 1$, and therefore $\nu \ll 1$, we have

$$\text{G-L}: \quad s_{\text{cutoff}} = \nu^{-1}, \qquad \text{E-Z}: \quad s_{\text{cutoff}} = \frac{5}{2} \left( \frac{2 - \nu}{\nu} \right)^2 \approx 10\, \nu^{-2}. \tag{26}$$

It is clear (see Fig. 3) that the range of cluster sizes for which one observes the power-law is
several orders of magnitude larger for the E-Z model than for the G-L model.

FIG. 3: Scale of exponential cutoff for the E-Z model (solid curve) and for the G-L model (dashed
curve) described by the same parameter $\nu$. The range of cluster sizes for which one observes the
power-law is several orders of magnitude larger for the E-Z model than for the G-L model.

We may also verify
that the special equilibrium result mentioned earlier for the continuous G-L model (see statement
in italics) is also a property of the corresponding discrete model, once the upper limits in the sums
are extended to infinity. It also holds that

$$\text{G-L model}: \quad L_F(s) = \frac{(s-1)\,\nu}{2(1-\nu)}\, L_C(s), \qquad \text{E-Z model}: \quad L_F(s) = \frac{\nu}{2(1-\nu)}\, L_C(s) \quad (s > 1). \tag{27}$$

We see therefore that for the G-L model we can always find a value of $s$ for which $L_F(s) \sim L_C(s)$
- in particular, it is the scale of the cluster size over which the exponential cutoff becomes apparent.
By contrast, in the E-Z model for $\nu^{-1} \gg 1$ (i.e. for the wide range over which there is power-law
behavior) we have $L_F(s) \ll L_C(s) \approx G_C(s)$. If we again compare both models, we find that
the $L_F(s)$ function is usually much larger for the G-L model than for the E-Z model. Figure 4
illustrates this finding for a particular set of parameters.

FIG. 4: $L_F(s)$, loss due to fragmentation for the E-Z model (solid curve) and for the G-L model
(dashed curve) with parameters $\nu = 0.1$ and $N = 1000$. The overall scale is determined up to a
multiplicative constant (i.e. the scale of time). The graphs show that $L_F(s)$ for the G-L model is
usually much larger than $L_F(s)$ for the E-Z model.
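The cutoff scale of Eq. (25) is easy to evaluate for both models. The short sketch below takes $\kappa = 1$, $e^{-\mu} = 1 - \nu$ for the G-L solution of Eq. (22), and $\kappa = 5/2$, $e^{-\mu} = 4(1-\nu)/(2-\nu)^2$ for the E-Z form of Eq. (23). The function names and the sample value of $\nu$ are illustrative.

```python
def s_cutoff(kappa, exp_minus_mu):
    """Cutoff scale of Eq. (25) for n_s proportional to s**(-kappa) * exp(-mu*s)."""
    return kappa / (1.0 - exp_minus_mu)

def cutoff_gl(nu):
    """G-L model: kappa = 1 and exp(-mu) = 1 - nu."""
    return s_cutoff(1.0, 1.0 - nu)

def cutoff_ez(nu):
    """E-Z model: kappa = 5/2 and exp(-mu) = 4(1 - nu) / (2 - nu)**2."""
    return s_cutoff(2.5, 4.0 * (1.0 - nu) / (2.0 - nu) ** 2)

nu = 0.01
ratio = cutoff_ez(nu) / cutoff_gl(nu)  # how much further the E-Z power law extends
```

For small $\nu$ this ratio grows as roughly $10/\nu$, which quantifies the difference in power-law range between the two fragmentation rules.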



D. Reservoir model. Fragmentation into clusters of fixed size 

So far we have looked at the cases in which the total population size $N$ is treated as one of
the parameters defining the model. Specifically, we considered a constant population such that the
constraint $N = \sum_{s=1}^{\infty} s\, n_s(t)$ strictly holds at every instant in time. This highly idealized situation
might not be realized in a particular real-world problem - however, we can formulate a 'reservoir
model' version in which the total population size is no longer a parameter defining the model,
but instead becomes a dynamical variable whose averaged equilibrium value is determined by the
model itself: $\langle N \rangle = \sum_{s=1}^{\infty} s\, \langle n_s(t) \rangle$. We introduce a constant supply of individuals from a system
reservoir, with $\gamma$ denoting the rate at which single individuals are added. The products of the
fragmenting cluster are then moved back to the reservoir. An equivalent interpretation is that a
cluster stays in the system but ceases to interact (i.e. it does not merge with other clusters).

Here we discuss the case in which the remainder of the dynamics resembles the terms in the
E-Z model, with $\beta s$ being the rate of removing a cluster of size $s$ and $\alpha s s'$ being the coalescence
rate. This particular reservoir model is therefore described by three parameters $\alpha$, $\beta$, $\gamma$, with only
two parameters required for the steady-state cluster size distribution. The master equations are

$$-\beta s\, n_s - \alpha\, s\, n_s \sum_{s'=1}^{\infty} s' n_{s'} + \frac{1}{2} \alpha \sum_{s'=1}^{s-1} s' n_{s'}\, (s - s')\, n_{s-s'} + \gamma\, \delta_{s,1} = 0. \tag{28}$$

By summation of Eq. (28), the average number of participants is obtained as

$$\langle N \rangle = \frac{\sqrt{\beta^2 + 2\alpha\gamma} - \beta}{\alpha}.$$
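The expression for $\langle N \rangle$ follows from summing Eq. (28) over all $s$, which gives the quadratic $\frac{\alpha}{2} \langle N \rangle^2 + \beta \langle N \rangle - \gamma = 0$. As a consistency check, the sketch below computes $\langle N \rangle$ from the closed form and then solves Eq. (28) recursively for $k_s = s\, n_s$, using $n_1 = \gamma / (\beta + \alpha \langle N \rangle)$ from the $s = 1$ equation. The function names and parameter values are hypothetical.

```python
import math

def reservoir_mean_population(alpha, beta, gamma):
    """Positive root of (alpha/2) N^2 + beta N - gamma = 0."""
    return (math.sqrt(beta ** 2 + 2.0 * alpha * gamma) - beta) / alpha

def reservoir_steady_state(alpha, beta, gamma, s_max):
    """Solve Eq. (28) recursively for k[s] = s * n_s, s = 1..s_max."""
    N_avg = reservoir_mean_population(alpha, beta, gamma)
    denom = beta + alpha * N_avg
    k = [0.0] * (s_max + 1)
    k[1] = gamma / denom                      # from the s = 1 equation
    for s in range(2, s_max + 1):
        k[s] = 0.5 * alpha / denom * sum(k[r] * k[s - r] for r in range(1, s))
    return N_avg, k

N_avg, k = reservoir_steady_state(alpha=1e-3, beta=1.0, gamma=10.0, s_max=200)
recovered = sum(k)  # sum over s of s * n_s
```

For these parameter values the recursion converges rapidly, and `recovered` agrees with `N_avg`, so the recursive solution is self-consistent with the closed-form population.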

The cluster size distribution has the same form as for the E-Z model, Eq. (23), if expressed in terms
of $\langle N \rangle$ and $\alpha/\beta$. In this case there is no approximation made in extending the summation limit to
infinity, and the solution in Eq. (23) is exact from the mean-field theory point of view. There is
no limit on the maximum size of a cluster, which in principle may exceed $\langle N \rangle$ when the effect of
fluctuations is non-negligible.
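
As a numerical cross-check of the reservoir model, the following Python sketch solves its steady state recursively. It assumes the balance $(\beta + \alpha \langle N \rangle)\, s n_s = \frac{\alpha}{2} \sum_{s'=1}^{s-1} s' n_{s'} (s-s') n_{s-s'} + \gamma \delta_{s,1}$ and takes $\langle N \rangle$ as the positive root of $\frac{\alpha}{2} N^2 + \beta N - \gamma = 0$, which is what summing the master equation over $s$ gives. The function name, the truncation `s_max` and the parameter values are illustrative choices and do not come from the paper.

```python
import math


def reservoir_steady_state(alpha, beta, gamma, s_max):
    """Mean-field steady state of the reservoir model, truncated at s_max.

    Returns (n_mean, n), where n[s] is the number of clusters of size s
    (n[0] is unused and set to 0).
    """
    # Positive root of (alpha/2) N^2 + beta N - gamma = 0.
    n_mean = (math.sqrt(beta ** 2 + 2.0 * alpha * gamma) - beta) / alpha
    a = 1.0 / (beta + alpha * n_mean)

    # w[s] = s * n_s; the convolution only involves smaller sizes,
    # so the sizes can be filled in one after another.
    w = [0.0] * (s_max + 1)
    w[1] = a * gamma
    for s in range(2, s_max + 1):
        conv = sum(w[r] * w[s - r] for r in range(1, s))
        w[s] = a * 0.5 * alpha * conv

    n = [0.0] + [w[s] / s for s in range(1, s_max + 1)]
    return n_mean, n


if __name__ == "__main__":
    n_mean, n = reservoir_steady_state(alpha=1.0, beta=1.0, gamma=0.5, s_max=200)
    total = sum(s * n[s] for s in range(1, len(n)))
    print("<N> from the closed form:", n_mean)
    print("sum of s * n_s          :", total)
```

For these illustrative parameters the recursively computed $\sum_s s\, n_s$ should agree with the closed-form $\langle N \rangle$ up to truncation error, which is a useful consistency test of the summation argument.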

We note that the E-Z model changes very little if we consider the case where a cluster fragments
into a set of smaller clusters, each of fixed size $s_0$. For the discrete system, there is naturally a
divisibility problem regarding fragmentation of clusters of sizes which are not a multiple of $s_0$.
Since we are interested in steady-state behavior, we may assume that such clusters do not fragment.
Whatever the initial configuration is, after a sufficiently long time the system in equilibrium will
consist almost entirely of clusters that are a multiple of $s_0$ in size. It turns out that the cluster size
distribution has the same form as the E-Z model in Eq. (23), if we re-express it in terms of $s_0$ as
the basic unit, i.e. if we substitute $s \to s/s_0$.



III. GENERALIZATION OF THE E-Z MODEL 



We now open up the above discussion to a broader class of coalescence-fragmentation models.
The variety of coalescence-fragmentation-type processes which have been employed to describe
physical, biological, and social systems in the literature is enormous [11, 13, 18, ...]. Here we focus
on generalizations of the E-Z model [2], given its potential relevance to understanding the empirical
distributions observed in financial markets and insurgent behavior [4, ...]. In particular, we will
investigate the effect of variations in the rules, and perturbations, on the cluster size distribution.

A. Spontaneous cluster formation

Our first generalization mimics the situation in which a small number of clusters are allowed
to form spontaneously from the population, as opposed to arising from the merger of two smaller
clusters. In practice this is most simply viewed as the spontaneous formation of clusters from
previously single agents (clusters of unit size); the exact mechanism is unimportant. Let $\gamma_s$
represent the rate of formation of clusters of size $s$ by this non-hierarchical method. The value of
$\gamma_1$ is implicitly defined by the requirement that the size $N$ of the population remains constant, i.e.,
$\sum_{s=1}^{\infty} s \gamma_s = 0$, and therefore $\gamma_1 < 0$. The rate equation is given by

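\[ \frac{\partial n_s}{\partial t} = -\beta s n_s + \frac{\alpha}{2} \sum_{r=1}^{s-1} r n_r (s - r) n_{s-r} - \alpha s n_s \sum_{r=1}^{\infty} r n_r + \gamma_s \]

for $s \ge 2$, and

\[ \frac{\partial n_1}{\partial t} = -\alpha n_1 \sum_{r=1}^{\infty} r n_r + \beta \sum_{r=2}^{\infty} r^2 n_r + \gamma_1 \]

for $s = 1$. In the steady state this may be written as

\[ s n_s = A \left( \frac{\alpha}{2} \sum_{r=1}^{s-1} r n_r (s - r) n_{s-r} + \gamma_s \right), \]

where $A$ is defined by

\[ A = \frac{1}{\beta + \alpha \sum_{r=1}^{\infty} r n_r}. \]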

In deriving these results, we have extended the summation of appropriate low-order terms to
infinity by introducing the approximation $\sum_{r=1}^{\infty} r n_r \approx N$. The generating function $g[y]$ is now
introduced:

\[ g[y] = \sum_{r=2}^{\infty} r n_r y^r. \qquad (30) \]

Taking the square of this function and using Eq. (30) yields

\[ 0 = (g[y])^2 - 2 \left( \frac{1}{A\alpha} - n_1 y \right) g[y] + n_1^2 y^2 + \frac{2}{\alpha} \chi[y], \qquad (31) \]
where $\chi[y] = \sum_{r=2}^{\infty} \gamma_r y^r$. Using the fact that $g[1] = \sum_{r=1}^{\infty} r n_r - n_1 \approx N - n_1$ gives

\[ n_1 \approx A \left( \frac{\alpha}{2} N^2 + \beta N - \chi[1] \right). \qquad (32) \]

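Solving Eq. (31) for general $y$ and expanding the resulting radical using Taylor's theorem yields

\[ g[y] = A\chi[y] + \frac{1}{A\alpha} \sum_{k=2}^{\infty} \frac{(2k-3)!!}{(2k)!!} \left[ 2A\alpha \left( n_1 y + A\chi[y] \right) \right]^k. \qquad (33) \]
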

We will assume that the $\gamma_s$ terms are small enough to be treated as a perturbation, so that
$A\chi[y]$ is small compared with $n_1 y$ and hence a first-order binomial expansion of the $k$th power
in Eq. (33) may be performed. In this case

\[ g[y] \approx A\chi[y] + \frac{1}{A\alpha} \sum_{k=2}^{\infty} \frac{(2k-3)!!}{(2k)!!} \left\{ [2A\alpha n_1 y]^k + k \frac{A}{n_1} [2A\alpha n_1]^k \sum_{r=2}^{\infty} \gamma_r y^{r+k-1} \right\}. \]

Comparing terms with Eq. (30) yields

\[ n_2 = \frac{A}{2} \gamma_2 + \frac{1}{4} A\alpha (n_1)^2 \]

for $s = 2$. For large $s$, Stirling's approximation yields

\[ n_s \approx \ldots \qquad (34) \]

where $X = \ldots$. Since $A$ is constant for a given population, the general form of the above equation
is

\[ n_s \propto k^s s^{-5/2} + Z[s]\, s^{-1}, \qquad (35) \]

where $k = 1 - A^2 (\beta^2 + 2\alpha X)$ and $Z[s]$ is a function whose form depends on the details of the
perturbation induced by the $\gamma_s$ terms.
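
The perturbed steady state can also be explored numerically. The Python sketch below iterates the steady-state relation $s n_s = A \left( \frac{\alpha}{2} \sum_{r=1}^{s-1} r n_r (s-r) n_{s-r} + \gamma_s \right)$ for a user-supplied set of $\gamma_s$ ($s \ge 2$). It rests on two assumptions that should be read as such: that $n_1$ follows from evaluating the quadratic relation for $g[y]$ at $y = 1$ with $\sum_r r n_r \approx N$, and that the quantity $X$ entering $k$ can be identified with $\chi[1] = \sum_{r \ge 2} \gamma_r$. The function name, the dictionary format for $\gamma_s$ and the parameter values are hypothetical.

```python
def perturbed_steady_state(alpha, beta, n_total, gamma, s_max):
    """Steady-state cluster sizes with spontaneous formation rates gamma.

    gamma is a dict {s: gamma_s} for s >= 2 (an illustrative perturbation).
    Returns (n, k): n[s] is the number of clusters of size s (n[0] unused)
    and k = 2 * A * alpha * n_1 is the base of the leading k^s s^(-5/2) term.
    """
    a = 1.0 / (beta + alpha * n_total)
    chi_one = sum(gamma.values())  # assumed to play the role of X
    # n_1 from the quadratic relation for g[y] evaluated at y = 1.
    n1 = a * (0.5 * alpha * n_total ** 2 + beta * n_total - chi_one)
    k = 2.0 * a * alpha * n1

    # w[s] = s * n_s, filled in order of increasing s.
    w = [0.0] * (s_max + 1)
    w[1] = n1
    for s in range(2, s_max + 1):
        conv = sum(w[r] * w[s - r] for r in range(1, s))
        w[s] = a * (0.5 * alpha * conv + gamma.get(s, 0.0))

    n = [0.0] + [w[s] / s for s in range(1, s_max + 1)]
    return n, k


if __name__ == "__main__":
    gamma = {2: 0.01, 3: 0.005}
    n, k = perturbed_steady_state(1.0, 1.0, 1.0, gamma, s_max=400)
    print("k =", k)
    print("sum of s * n_s =", sum(s * n[s] for s in range(1, len(n))))
```

Under these assumptions the computed $k$ coincides with $1 - A^2(\beta^2 + 2\alpha \chi[1])$, and $\sum_s s\, n_s$ returns the input $N$ up to truncation error, so the sketch doubles as a self-consistency check of the mean-field closure.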

B. Step perturbation



We now analyze a highly simplified example from the class of perturbations which die off as $s$
increases. In particular, we consider a step function perturbation:

\[ \gamma_s = \begin{cases} \dfrac{\Phi}{q-1}, & \text{for } 2 \le s \le q; \\ 0, & \text{for } s > q; \end{cases} \]

where $q$ is an arbitrarily chosen cluster size and $\Phi > 0$. Using the original E-Z parametrization of
Eq. (13) and Eq. (34) we obtain the cluster size distribution as

1_$ 



Tlx Ri N 



n 2 



2(2 - u) 



q 



2-i/' 
1 - v 



N 



(2 - z/)e 2 



4v^F(l -v) [(2 - v 



4(1 - „) 



;i-$) 



,-5/2 



-1 



2-i/g- 1 
2 $ 1 



2^(1 - 1 

s-1 r 



r=2 



4(1 ~ ij 
(2 - z/) 2 



;i-$) 



for 3 ^ s ^ g, and 



iV 



z/ e 



4(1 -z/) 



40F(l-z/) L(2-z/) 2 
e 2 $ 1 



4-$) 



-5/2 



2V5F(l-$)?-l 
9 r 4(l-z/ 



E 



r=2 



1 - $) 



[{2-u) 2 

for $s \ge q + 1$. Examples of the resulting $n_s$ distribution are plotted in Fig. 5.

Interestingly, the greatest effect of the perturbation is found at high $s$, whereas the perturbation's
definition means that it only directly affects the clustering at low $s$. This is because the perturbation
creates small clusters by non-hierarchical means, which then serve as effective nucleation sites for
the formation of larger clusters. The perturbation therefore greatly accelerates the formation of
large clusters whereas, by contrast, the small clusters fragment sufficiently fast that their presence
is hidden on the graph at low $s$.

FIG. 5: Predicted distribution of cluster sizes for the perturbed system described in Section III B
using $N = 1000$, $\nu = 0.1$ and $\Phi = 0.01$. The dot-dashed line shows the unperturbed population,
while the dashed line shows $q = 10$, the dotted line shows $q = 100$, and the solid line shows
$q = 1000$.

Figure 6 shows the predicted distribution of $n_s$ for different signs
of the perturbation ($\pm\Phi$), together with the unperturbed result. Note that in the case of a negative
sign, it is necessary that

\[ \Phi < \frac{\nu^2}{4(1-\nu)} \qquad (36) \]

in order that $n_s$ remain finite as $s \to \infty$. The analytic predictions for the perturbed populations
are quantitatively reliable for a wide range of $s$ values. With primed quantities referring to the
$-\Phi$ case, and using $N = 10000$, $\Phi = 0.001$, $\nu = 0.1$ and $q = 500$, we find that the effect of the
perturbation is as follows:

\[ \frac{n_1}{n'_1} = 0.998, \qquad \frac{n_{500}}{n'_{500}} = 0.39, \qquad \frac{n_{1000}}{n'_{1000}} = 0.14. \]

As claimed earlier, this small, low-$s$ perturbation can be seen to have a very significant effect
across a wide range of $s$, in particular at high $s$. We note that the interpretation of the perturbation
is that statistically a cluster of size 500 or less spontaneously forms (for $+\Phi$) or fragments (for $-\Phi$)
once in every 1000 timesteps, where a single timestep corresponds to any particular
fragmentation or coalescence event in the system.

FIG. 6: Predicted cluster size distribution for the $+\Phi$ case (dotted) and the $-\Phi$ case (dashed)
compared with the unperturbed model (solid). Parameter values: $N = 10000$; $\Phi = 0.001$;
$\nu = 0.1$; $q = 500$.
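
The size of the effect at high $s$ can be estimated from the base of the leading $k^s s^{-5/2}$ term alone. The Python sketch below assumes the E-Z parametrization in the form $\alpha = 2(1-\nu)/N^2$, $\beta = \nu/N$ and a total spontaneous formation rate $\sum_{s \ge 2} \gamma_s = \pm\Phi$, for which the factor $4(1-\nu)(1 \mp \Phi)/(2-\nu)^2$ that appears in the expressions above plays the role of $k$. The function names are hypothetical, and only the leading term is considered, so the ratios are rough estimates rather than the full analytic prediction.

```python
def decay_factor(nu, phi):
    """Base k of the leading k^s s^(-5/2) term for total perturbation rate phi.

    phi > 0 is the spontaneous-formation case, phi < 0 the opposite sign.
    """
    return 4.0 * (1.0 - nu) * (1.0 - phi) / (2.0 - nu) ** 2


def max_negative_amplitude(nu):
    """Largest |phi| for the negative-sign case that keeps k below 1."""
    return nu ** 2 / (4.0 * (1.0 - nu))


def leading_ratio(nu, phi, s):
    """Leading-order estimate of n_s(+phi) / n_s(-phi) at cluster size s."""
    return (decay_factor(nu, phi) / decay_factor(nu, -phi)) ** s


if __name__ == "__main__":
    nu, phi = 0.1, 0.001
    print("bound on phi       :", max_negative_amplitude(nu))
    print("k for +phi, -phi   :", decay_factor(nu, phi), decay_factor(nu, -phi))
    print("ratio at s = 500   :", leading_ratio(nu, phi, 500))
    print("ratio at s = 1000  :", leading_ratio(nu, phi, 1000))
```

With $\nu = 0.1$ and $\Phi = 0.001$ the leading-order ratios are close to $e^{-1}$ and $e^{-2}$ at $s = 500$ and $s = 1000$, the same order as the quoted values of 0.39 and 0.14; the remaining difference would come from the $Z[s]\, s^{-1}$ contribution, which this sketch omits.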



C. Variable population size 



Only a small subset of real-world problems correspond to populations with a fixed size $N$, or
with a fixed time-averaged size $\langle N \rangle$. In this section of the paper, we develop an analytic treatment
of a model which is analogous to the E-Z model, but which treats the case of a population whose
size varies with time according to a simple law. As mentioned earlier, several real-world systems
seem to have power-law behavior with exponent around 2.5, which is the same behavior as the
unperturbed E-Z model: for example, the distributions of size of trades in markets, and the size of
attacks in conflict and terrorism [..., 4, 5, ..., 45, 46]. Such real-world observations could therefore
conceivably be attributed to the E-Z model. However, this identification would be far more
believable if the E-Z model's assumption of constant $N$ did not have to be made. It is known that as the
years pass in an active war, an insurgent population will generally increase in size as previously
passive people become recruited. Likewise, as a market grows, previously inactive individuals tend
to join the trading. Hence a model with increasing $N$ (or decreasing $N$ for mature wars or markets
that are dying off) is of interest. Real-world examples of declining populations are also known.

We now look at a version which can be treated analytically under the assumption that the
coalescence processes are negligible. Although this makes it arguably more restricted than our
previous versions, the advantage is that the equation retains linear temporal dynamics and admits
a novel solution. Including all coalescence terms would make it non-linear and intractable.

Our model considers a population containing $N[t]$ agents instantaneously divided into $M[t]$
clusters, as in the E-Z model. First we focus on the number of agents increasing in time, and
introduce the following E-Z-like rules:

1. In a single timestep, with probability $p[t]$, $L[t]$ new agents are added to a single cluster of
size $s$, the cluster being selected with probability proportional to $s$.

2. Alternatively, with probability $q[t] = 1 - p[t]$, a randomly selected cluster fragments
(selection of this cluster is independent of cluster size).

If the change in the number of agents is negative, then the model runs as follows:

1. In a single timestep, with probability $p[t]$, $|L[t]|$ agents are removed from a single cluster of
size $s$, the cluster being selected with probability proportional to $s$. If the selected cluster
has $s \le |L[t]|$ then nothing occurs.

2. Alternatively, with probability $q[t] = 1 - p[t]$, a randomly selected cluster fragments, with
selection of this cluster independent of cluster size.

The rationale for adding or subtracting from a single cluster is that in many situations of interest, 
only a single cluster will likely be involved in an external event which changes the population size. 
As with all these generalizations, more realistic rules can of course be explored - but one runs the 
risk of obtaining increasingly complicated results. 
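
The rules for the growing population are simple enough to simulate directly. The Python sketch below is one possible agent-level implementation with constant $p$ and $L$; it assumes, as in the E-Z model, that a fragmenting cluster breaks up completely into single agents. The function name, the list-of-sizes representation and the parameter values are illustrative and not taken from the paper.

```python
import random


def simulate_growth(initial_sizes, p, L, steps, seed=0):
    """Simulate the growing-population rules with constant p and L > 0.

    initial_sizes: list of cluster sizes at t = 0.
    Returns (sizes, additions): the final list of cluster sizes and the
    number of timesteps in which L agents were added.
    """
    rng = random.Random(seed)
    sizes = list(initial_sizes)
    additions = 0
    for _ in range(steps):
        if rng.random() < p:
            # Rule 1: choose a cluster with probability proportional to its
            # size and add L new agents to it.
            idx = rng.choices(range(len(sizes)), weights=sizes, k=1)[0]
            sizes[idx] += L
            additions += 1
        else:
            # Rule 2: choose a cluster uniformly at random and fragment it
            # into single agents (assumed E-Z-style fragmentation).
            idx = rng.randrange(len(sizes))
            s = sizes[idx]
            sizes[idx] = sizes[-1]  # swap-remove the chosen cluster
            sizes.pop()
            sizes.extend([1] * s)
    return sizes, additions


if __name__ == "__main__":
    sizes, additions = simulate_growth([1] * 50, p=0.5, L=3, steps=2000, seed=1)
    print("agents N  :", sum(sizes))
    print("clusters M:", len(sizes))
    print("additions :", additions)
```

In such a run the total number of agents grows by exactly $L$ per addition event, so on average $N$ increases by $pL$ per timestep, while the number of clusters changes only through fragmentation.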



1. Increasing population size: L[t] > 0



The model proposed above leads to the rate equations

\[ \frac{\partial n_s}{\partial t} = \frac{p[t]}{N[t]} \left( (s - L[t])\, n_{s-L[t]} - s n_s \right) - \frac{q[t]}{M[t]} n_s \]

for $s > L[t]$,

\[ \frac{\partial n_s}{\partial t} = -\frac{p[t]}{N[t]} s n_s - \frac{q[t]}{M[t]} n_s \]

for $2 \le s \le L[t]$, and

\[ \frac{\partial n_1}{\partial t} = -\frac{p[t]}{N[t]} n_1 + \frac{q[t]}{M[t]} \sum_{r=2}^{\infty} r n_r, \]

with resulting totals

\[ \frac{dN}{dt} = p[t] L[t], \qquad (37) \]

\[ \frac{dM}{dt} = q[t] \left( \frac{N[t]}{M[t]} - 1 \right). \qquad (38) \]


The solution of the above equations clearly depends on the forms of $L[t]$ and $p[t]$. As a simple
example, we take both to be constant: $L[t] = L$ and $p[t] = p$ for all $t$. In this case, it can be seen
that for times $t \gg N[0]/(pL)$, Eq. (37) yields the linear solution

\[ N[t] = pLt. \]

If we assume a similar asymptotically linear form for $M[t]$ at large $t$, $M[t] = \sigma t$, we can go on to
deduce from Eq. (38) that

\[ \sigma = \frac{1}{2} \left( \sqrt{q^2 + 4pqL} - q \right). \]

We now assume a linear form for all $n_s$: $n_s[t] = c_s t$. In this case, one obtains the solution

\[ c_1 = \frac{qL}{\sigma (L+1)} \sum_{k=1}^{\infty} (1 + kL)\, c_{1+kL} = \frac{qL}{\sigma (L+1)} \left( pL - c_1 \right). \]



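Therefore

\[ c_1 = \frac{pq}{\sigma} \frac{L^2}{L(1 + q/\sigma) + 1}, \]

\[ c_{1+kL} = \frac{pq}{\sigma} \frac{L^2}{L(1 + q/\sigma) + 1} \frac{\rho!^{(L)}\, (1 + (k-1)L)!^{(L)}}{(\rho + kL)!^{(L)}}, \qquad (39) \]

for $k = 1, 2, 3, \ldots$, where

\[ \rho = \left( \frac{q}{\sigma} + 1 \right) L + 1 \]

and we have used the multifactorial function, defined by

\[ m!^{(n)} = \begin{cases} 1, & \text{if } 0 \le m < n; \\ m\, (m-n)!^{(n)}, & \text{if } m \ge n. \end{cases} \]

Clearly $c_s = 0$ for $s \ne 1 + kL$. Via a generalization of Stirling's approximation,

\[ \ln\left( n!^{(b)} \right) \approx \frac{1}{b} \left( n \ln n - n \right). \]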
b 

Applying this to Eq. (39), we obtain our solution:

\[ c_{1+kL} \approx \frac{pq}{\sigma} \frac{L^2 e^{(L-1)/L} \rho^{\rho/L}}{L(1 + q/\sigma) + 1} \frac{(1 + (k-1)L)^{k-1+1/L}}{(\rho + kL)^{k+\rho/L}} \qquad (40) \]

for integer $k \ge 1$. If we take a snapshot of this system at any given time, the observed cluster size
distribution will be given by Eq. (40), modulo a multiplicative constant which grows linearly with
time. The leading $k$-dependent behavior of Eq. (39) is

\[ c_{1+kL} \sim k^{-(2 + q/\sigma)}. \qquad (41) \]

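
The linear-growth solution lends itself to a direct numerical check. The Python sketch below assumes constant $p$ and $L$, takes $\sigma$ as the positive root of $\sigma^2 + q\sigma - pqL = 0$ (which follows from inserting $N = pLt$ and $M = \sigma t$ into $dM/dt = q(N/M - 1)$), and builds the coefficients from the recursion $c_{1+kL} = c_{1+(k-1)L}\, (1 + (k-1)L)/(\rho + kL)$ with $\rho = L(1 + q/\sigma) + 1$, which is the product form behind the multifactorial expression. Function names and parameter values are illustrative.

```python
import math


def linear_growth_coefficients(p, L, k_max):
    """Coefficients of n_s[t] = c_s t for constant p and L > 0.

    Returns (sigma, rho, coeffs) with coeffs[k] = c_{1 + k L}, k = 0..k_max.
    """
    q = 1.0 - p
    sigma = 0.5 * (math.sqrt(q * q + 4.0 * p * q * L) - q)
    rho = L * (1.0 + q / sigma) + 1.0
    coeffs = [(p * q / sigma) * L * L / rho]  # c_1
    for k in range(1, k_max + 1):
        coeffs.append(coeffs[-1] * (1.0 + (k - 1) * L) / (rho + k * L))
    return sigma, rho, coeffs


def local_exponent(coeffs, k):
    """Estimate the power-law exponent from c at k and at 2k."""
    return math.log(coeffs[k] / coeffs[2 * k]) / math.log(2.0)


if __name__ == "__main__":
    p, L = 0.5, 2
    sigma, rho, c = linear_growth_coefficients(p, L, k_max=20000)
    q = 1.0 - p
    print("sigma                  :", sigma)
    print("sum of c_s (-> sigma)  :", sum(c))
    print("sum of s c_s (-> p L)  :", sum((1 + k * L) * ck for k, ck in enumerate(c)))
    print("local exponent         :", local_exponent(c, 1000))
    print("2 + q / sigma          :", 2.0 + q / sigma)
```

For $p = 0.5$ and $L = 2$ the recursion reduces to $c_{1+2k} = 6/[(2k+1)(2k+3)(2k+5)]$, so the sums over $c_s$ and $s c_s$ converge to $\sigma = 0.5$ and $pL = 1$, and the local exponent approaches $2 + q/\sigma = 3$.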

2. Decreasing population size: L[t] < 0

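For simplicity in the following analysis, we do not allow complete annihilation of clusters (i.e.
we do not allow the removal of all of a cluster's members from the population). The rate equations
for $L[t] < 0$ are as follows:

\[ \frac{\partial n_s}{\partial t} = \frac{p[t]}{N[t]} \left( (s + |L[t]|)\, n_{s+|L[t]|} - s n_s \right) - \frac{q[t]}{M[t]} n_s \]

for $s > |L[t]|$,

\[ \frac{\partial n_s}{\partial t} = \frac{p[t]}{N[t]} (s + |L[t]|)\, n_{s+|L[t]|} - \frac{q[t]}{M[t]} n_s \]

for $2 \le s \le |L[t]|$, and

\[ \frac{\partial n_1}{\partial t} = \frac{p[t]}{N[t]} (1 + |L[t]|)\, n_{1+|L[t]|} + \frac{q[t]}{M[t]} \sum_{r=2}^{\infty} r n_r \]

for $s = 1$, with resulting totals

\[ \frac{dN}{dt} = -\frac{p[t]\, |L[t]|}{N[t]} \sum_{r=1+|L|}^{\infty} r n_r, \qquad (42) \]

\[ \frac{dM}{dt} = q[t] \left( \frac{N[t]}{M[t]} - 1 \right). \qquad (43) \]
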

As above, we can obtain a solution by assuming that $p$ and $L$ are both constant, and then introducing
a linear trial solution of the form

\[ N[t] = N_0 - \gamma t, \qquad M[t] = M_0 + \sigma t, \qquad n_s[t] = C_s - c_s t. \]

This approximation can only hold as long as the changes in each $n_s$ are small compared to the size
of the respective $C_s$. In this case (i.e., for $t$ not too large) we obtain

\[ \sigma \approx q \left( \frac{N_0}{M_0} - 1 \right), \qquad \gamma \approx \frac{p |L|}{N_0} \sum_{r=1+|L|}^{\infty} r C_r, \]

and for the $c_s$ we obtain:

\[ c_1 \approx -\frac{p}{N_0} (1 + |L|)\, C_{1+|L|} - \frac{q}{M_0} \left( N_0 - C_1 \right) \]

for $s = 1$,

\[ c_s \approx \frac{q}{M_0} C_s - \frac{p}{N_0} (s + |L|)\, C_{s+|L|} \]

for $2 \le s \le |L|$, and

\[ c_s \approx \left( \frac{q}{M_0} + \frac{p}{N_0} s \right) C_s - \frac{p}{N_0} (s + |L|)\, C_{s+|L|} \]

for $s > |L|$.

With a suitable choice of initial conditions and a large population, one can therefore infer the 
small-t behavior of the system. 



3. Decreasing population: proof of concept 

As a simple example, we take $L < 0$ and a starting population of the form

\[ n_s[t = 0] = \begin{cases} C_1 - \phi s, & \text{if } s \le C_1/\phi; \\ 0, & \text{if } s > C_1/\phi. \end{cases} \]

In this case our equations from Section III C 2 yield



N = \c x 
o 
FIG. 7: Predictions of the model of Section III C 3 using parameter values $C_1 = 1000$, $\phi = 14$,
$p = 0.3$ and $L = -30$. This yields population size parameters of $N_0 = 850173$ and $M_0 = 35214$.
Line styles reflect different values of the parameter $t$: dot-dashed ($t = 0$), dotted ($t = 5000$),
dashed ($t = 10000$) and solid ($t = 15000$). Beyond $t = 15000$ it can be seen that the
approximations made in the derivation of Section III C 2 become inaccurate.



This leads to an expression for $n_1$ of the form



ni[t] « d + 



-^(1 + |L|)(CW(1 + |£|)) 
with corresponding $n_s$ of the form



n s [t] « C x -(j)s- 



Q r \ L \P r 
M JV 



P 



9 



— C 2 \L\ + —<\> I S + —(f)S 



P jlJ2 



M T J N 



for $2 \le s \le |L|$, and



n s [£] w Ct-^s- 

'2 \L\ p q 



JLr - & 

M 1 JV 



C, 



N M 

for $s > |L|$. Figure 7 shows a plot of this model using illustrative parameter values.






D. Heterogeneity of members 

In many real-world systems - in particular, biological or social systems - the population is het- 
erogeneous. In addition to the basic question of whether approximating a heterogeneous system 
by a homogeneous model is justifiable, there is the deeper issue of how to formally introduce het- 
erogeneity into coalescence-fragmentation systems. As we have seen in this paper, small changes 
in coalescence-fragmentation rules can sometimes yield dramatic changes in the cluster size dis- 
tribution, and vice versa. In other words, the 'devil may be in the detail' in terms of the emergent 
phenomena that can be expected from a given set of microscopic rules. Our limited goal here is to 
explore some encouraging developments in this area, highlighting the circumstances in which the 
heterogeneity of the population allows an accurate description in terms of an effective homogeneous
model.



Reference [18] introduces a 'character' to each object by means of an $m$-dimensional
normalized vector which is formed from $m$-bit binary strings. The scalar product of any two such
characters then becomes the argument of a function which controls the coalescence and
fragmentation processes. The general case requires numerical simulation. Interestingly, however, this
model produces a power-law over part of its range with a slope of 2.5, identical to that of the
homogeneous E-Z model. Instead of the power-law exponent, it is the form of the exponential cut-off
which turns out to depend on the heterogeneity of the population. We recently explored another
type of heterogeneous E-Z-like model, showing that it can bridge the gap between the power-law
slope of magnitude 2.5 for clusters in the E-Z model (and hence 1.5 for price returns) and the
empirical value of financial market price returns which is typically closer to 4 [55]. A simple version
of the vector model is provided via a fascinating recent variation proposed by Hui [52] in which
the heterogeneity is represented by a character parameter $p_k \in [0, 1]$ which is assigned to each
object in the entire population, where objects are numbered by $k = 1 \ldots N$. The probability that
an agent $i$ and another agent $j$ form a link (and therefore that the distinct clusters to which
these members belong merge) depends on the value $|p_i - p_j|$. In principle it may be a general
symmetric function $p(p_i - p_j)$. The fragmentation of a cluster may also depend on the characters
of the members that form the particular cluster. One way of introducing this is by a mechanism
in which fragmentation of the whole cluster is triggered by breaking any single link that belongs
to it [52]. Since a weaker link is easier to break, it is assumed that the probability that the link
breaks is proportional to $p(p_i - p_j)$, which may be interpreted as a measure of the strength of the
link formed between members $i$ and $j$. If $p(p_i - p_j)$ is a function which is sharply peaked at 0,
we will have a situation where the newly formed clusters consist only of members of very similar
character, and the whole system may be considered as a mixture of several homogeneous
population subsystems which do not interact with each other. Each of these subsystems is described by
a cluster size distribution of the form in Eq. (23) with constants determined by the distribution
of characters across the population. The cluster size distribution for the whole system (regardless
of the character) is then a sum of the distributions for the subsystems. We therefore still observe
scale-free behavior with variation in the form of the cut-off (i.e. diversity in the heterogeneity of
the population induces diversity in the constants describing the subsystems, and hence lengthens
the tail of the cluster size distribution). In the opposite limiting case, the function $p(p_i - p_j)$ does
not vary sharply over its argument, e.g. $p(p_i - p_j) \propto 1 - |p_i - p_j|$, thereby yielding homogeneous
mixing. Then the distribution of characters across different clusters is uniform and the system can
therefore be described as an effectively homogeneous one by Eqs. (12) and (23). The presence of
the heterogeneity changes only the value of $\alpha/\beta$ in Eq. (12).



IV. CONCLUSIONS AND IMPLICATIONS 



We have examined various coalescence-fragmentation systems, with the goal of elucidating 
how subtle changes in their underlying rules can affect the resulting distribution of cluster sizes. 
In the process, we have managed to connect the rules of coalescence and fragmentation with terms 
in the corresponding rate equations, and have identified the specific ways in which they affect the 
resulting distribution of cluster sizes. The connections are not always direct, but we have offered 
various insights which help establish a more direct link. In each case studied, the system senses the
fragmentation function in two ways: the appearance of new clusters coming from the fragments
of the fragmented cluster (represented by $G_F(s)$), and the disappearance of clusters that fragment
(represented by $L_F(s)$).

As a result of our analysis, we can better understand what factors dictate when a power-law is
likely to emerge, and what tends to control its exponent. We conclude that: (1) it is a substantial
contribution of $L_F(s)$ in the equilibrium condition which may prevent the size distribution
from showing power-law behavior; (2) the presence or absence of $G_F(s)$ (i.e. the appearance of
new clusters as fragmentation products) strongly influences the value of the power-law exponent
itself, in cases where the power-law emerges. In the case where the parameter controlling the
fragmentation is small but finite, it is hard to identify a common limiting case for the various
systems studied. However, the form of the fragmentation function does influence the cluster size
distribution regardless of the value of this parameter. Note that if the fragmentation rate tends
to zero, the system cannot be clearly described using mean-field theory, since it exhibits
quasi-oscillatory behavior associated with the build-up of one supercluster containing essentially the
whole population, and this supercluster's eventual break-up. Whatever the mode of fragmentation,
the exponent of the power-law may be controlled by altering the power of the cluster size $s$ which is
involved in the fragmentation and coalescence function. Specifying it realistically requires some
detailed understanding of the system at the microscopic level. The most common mechanism
of coalescence is created by building random links between the population members, yielding a
coalescence function of the form $\sim s s'$.

If we adopt a point of view in which the system is considered as an evolving network, the 
clusters represent disconnected components. Depending on the particular rules, the fragmentation 
process now corresponds to breaking links. If the disconnected component in a network breaks 
predominantly into single members, it might be still interpreted in terms of the fragmentation 
being triggered by a single member, provided we allow some kind of link-breaking virus to spread 
rapidly throughout the entire disconnected component. Somewhat counter-intuitively, we have 
also seen that the behavior of the heterogeneous system does not substantially differ from the 
behavior of the homogeneous one. This results from two effects: the homogeneous mixing effect, 
and the coexistence of several non-interacting populations whose distinct 'characters' lie hidden 
in the cluster size distribution. 

Although we have mentioned various possible applications, we finish by noting a new one. 
Many of the neurodegenerative disorders associated with aging, for example Alzheimer's disease, 
are thought to be associated with the large-scale self-assembly of nanoscale protein aggregates 
in the brain [20]. Protein-aggregation has of course attracted much attention over the years in 
both the chemistry and physics literature - however, the problem of protein aggregates in neu- 
rodegenerative diseases is known to be much harder than traditional polymer problems, because 
of the complexity of the individual proteins themselves [20]. Given the wide range of possible 
heterogeneities in vivo within a cell, there is typically insufficient knowledge to specify either 
(i) a specific diffusion model and its geometry and boundary conditions, as a result of
geometrical restrictions and crowding effects [53], or (ii) a specific reaction model for the binding rates,
given the wide variety of conformational states in which molecules may meet. It therefore makes
sense to assign some probabilities to the aggregation process, and in particular, coalescence and
fragmentation probabilities to describe the joining of an $n$-mer with an $n'$-mer to give an $n''$-mer,
where $\{n, n', n''\} = 1, 2, 3, \ldots$, and its possible breakup. The precise details of the coalescence
and fragmentation rules now take on a critical importance, since subtle changes in these rules
can alter the resulting size distribution of the $n$-mer population. The practical question of how
fatal a given realization of the disease will be in a particular patient becomes intertwined with the
question of whether the distribution of cluster sizes is a regular one in terms of its fluctuations
(e.g. a Gaussian or Poisson distribution, which both have a finite variance) or a power-law,
which may then have a formally infinite variance. Although in practice a cut-off always exists,
a power-law with an exponent $\alpha < 2$ has (in principle) an infinite mean and infinite standard
deviation; a power-law with $2 < \alpha < 3$ has (in principle) a finite mean but an infinite standard
deviation; and a power-law with $\alpha > 3$ has a finite mean and finite standard deviation. The
implication is that a coalescence-fragmentation process producing a power-law with $\alpha < 3$, as in
E-Z-type models where $\alpha \sim 2.5$, has a significant probability of forming very large $n$-mers
because of its (in principle) infinite standard deviation. Suppose for the moment that an $n$-mer of
size $n > n_0$ can produce a neurodegenerative disorder; then the fraction of such dangerous $n$-mers
in a soup of self-assembling polymer aggregates will be non-negligible if $\alpha < 3$. In the highly
crowded, heterogeneous $n$-mer population expected in the human body, the resulting value of any
approximate power-law slope $\alpha$ could therefore be a crucial parameter to estimate. Engineering
this $\alpha$ value such that large aggregates are unlikely, through subtle changes in the
coalescence and fragmentation processes, then becomes a very real possibility. It also adds direct
medical relevance which justifies further work on this topic in the future.
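
The statements about finite and infinite moments, and about the weight of the large-size tail, can be illustrated with a short Python sketch. It assumes a pure power-law $n^{-\alpha}$ on the integers with a hard cut-off at `n_max`; the threshold `n0`, the cut-off and the function names are hypothetical choices made only for illustration.

```python
def moment_is_finite(exponent, order):
    """True if sum_n n^order * n^(-exponent) converges without a cut-off.

    The sum converges exactly when exponent > order + 1, so the mean
    (order 1) needs exponent > 2 and the variance (order 2) needs exponent > 3.
    """
    return exponent > order + 1


def tail_fraction(exponent, n0, n_max):
    """Fraction of clusters with size above n0 for n^(-exponent), n <= n_max."""
    weights = [n ** (-exponent) for n in range(1, n_max + 1)]
    # weights[i] belongs to size i + 1, so sizes above n0 start at index n0.
    return sum(weights[n0:]) / sum(weights)


if __name__ == "__main__":
    for exponent in (1.5, 2.5, 3.5):
        print(
            exponent,
            "finite mean:", moment_is_finite(exponent, 1),
            "finite variance:", moment_is_finite(exponent, 2),
            "tail fraction above 100:", tail_fraction(exponent, 100, 10 ** 5),
        )
```

Comparing exponents 2.5 and 3.5 in this way shows how strongly the fraction of very large clusters depends on the slope, which is the point of estimating $\alpha$ in the aggregation setting.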